EFA for EPSY 905

Author

Dasha Yermol

Load R packages

library(readxl) #read in xlsx file
library(tidyverse) #data cleaning and manipulation
library(psych) #to conduct EFA and EFA-related tests
library(lavaan) #to conduct EFA
library(lavaanPlot) #to plot EFA objects

Read in the data

data <- read_xlsx("saq_data.xlsx")
df <- data.frame(data)

Any missing data? -- NO! :)

table(is.na(df))

FALSE 
59133 

Assumptions of EFA:

  1. Relationship between variables and latent factors are linear
  2. Some underlying latent structure exists
  3. Linear coefficients are same across participants
  4. Linear relationship between the variables and underlying normal distribution
  5. Variables are ideally continuous, and if they are continuous, should have multivariate normality

Assumptions 1-3 are conceptual assumptions of EFA. We will test assumptions 4 and 5 below.

Assumption 4: Linearity and underlying normal distribution

Histograms of variables – From these, it appears that most variables are not normally distributed, but this is not surprising considering that we are working with ordinal data.

library(patchwork) #package that puts multiple figures side by side

create_histogram <- function(data, col) {
  ggplot(data, aes(x = .data[[col]])) +
    geom_histogram(binwidth = 1, fill = "cornflowerblue", color = "black") +
    labs(title = paste("Histogram of", col),
         x = col, y = "Frequency") +
    theme(aspect.ratio = 1/5)
}


histograms_ggplot <- lapply(colnames(df), function(col) {
  create_histogram(df, col)
})

combined_histograms <- wrap_plots(histograms_ggplot, ncol = 4)

combined_histograms

Boxplot of variables – another way to show distributions of the variables

boxplot(df, notch = TRUE, boxfill= "cornflowerblue", whiskcol = "firebrick", pch=16, outcol = "firebrick")

Pearson correlations between variables; it appears that we mostly have positive correlations between variables

library(corrgram) #a package to visualize correlations between variables
suppressWarnings(corrgram(df))

Looking at univariate skewness and kurtosis

According to Curran et al. (1996), univariate skew greater than 2.0 or kurtosis greater than 7.0 would indicate “severe univariate nonnormality”. None of our items showed concerning univariate skew or kurtosis. This is reassuring; however, these tests are intended for continuous variables, so the finding should be interpreted with caution.

describe(df)
    vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
q01    1 2571 2.37 0.83      2    2.34 0.00   1   5     4  0.65     0.61 0.02
q02    2 2571 1.62 0.85      1    1.46 0.00   1   5     4  1.49     2.04 0.02
q03    3 2571 2.59 1.08      3    2.57 1.48   1   5     4  0.09    -0.78 0.02
q04    4 2571 2.79 0.95      3    2.74 1.48   1   5     4  0.39    -0.29 0.02
q05    5 2571 2.72 0.96      3    2.68 1.48   1   5     4  0.46    -0.44 0.02
q06    6 2571 2.23 1.12      2    2.09 1.48   1   5     4  0.93     0.15 0.02
q07    7 2571 2.92 1.10      3    2.89 1.48   1   5     4  0.20    -0.85 0.02
q08    8 2571 2.24 0.87      2    2.15 0.00   1   5     4  1.05     1.48 0.02
q09    9 2571 2.85 1.26      3    2.83 1.48   1   5     4 -0.06    -1.15 0.02
q10   10 2571 2.28 0.88      2    2.21 0.00   1   5     4  0.83     0.59 0.02
q11   11 2571 2.26 0.88      2    2.18 0.00   1   5     4  0.84     0.90 0.02
q12   12 2571 3.16 0.92      3    3.12 1.48   1   5     4  0.18    -0.21 0.02
q13   13 2571 2.45 0.95      2    2.40 1.48   1   5     4  0.61     0.04 0.02
q14   14 2571 2.88 1.00      3    2.84 1.48   1   5     4  0.29    -0.36 0.02
q15   15 2571 2.77 1.01      3    2.72 1.48   1   5     4  0.43    -0.45 0.02
q16   16 2571 2.88 0.92      3    2.82 1.48   1   5     4  0.40    -0.13 0.02
q17   17 2571 2.47 0.88      2    2.40 0.00   1   5     4  0.76     0.49 0.02
q18   18 2571 2.57 1.05      2    2.51 1.48   1   5     4  0.48    -0.23 0.02
q19   19 2571 2.29 1.10      2    2.21 1.48   1   5     4  0.48    -0.72 0.02
q20   20 2571 3.62 1.04      4    3.68 1.48   1   5     4 -0.39    -0.65 0.02
q21   21 2571 3.17 0.98      3    3.12 1.48   1   5     4  0.12    -0.78 0.02
q22   22 2571 2.89 1.04      3    2.93 1.48   1   5     4 -0.07    -0.68 0.02
q23   23 2571 3.43 1.04      4    3.49 1.48   1   5     4 -0.59    -0.16 0.02
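The Curran et al. cutoffs can also be checked programmatically (a sketch; `psych::describe()` returns `skew` and `kurtosis` columns that we compare against the thresholds):

```r
desc <- describe(df)
#items with |skew| > 2.0 or kurtosis > 7.0 ("severe univariate nonnormality")
rownames(desc)[abs(desc$skew) > 2.0 | desc$kurtosis > 7.0]
```

Given the table above, this should return no items.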

Summary of Assumption #4 (Linear relationship between the variables and underlying normal distribution) checking: Most of the variables are not normally distributed. However, this is typical for ordinal data. No concerns about univariate skewness or kurtosis (but this was done using tests meant for continuous variables, so these findings should be interpreted with caution).

Assumption 5: Continuous Variables & Multivariate Normality

Mardia’s multivariate test for skewness and kurtosis.

Our data are positively skewed and leptokurtic (more peaked and heavier-tailed than a normal distribution):

  • Multivariate skew = 20.74 (p < 0.001)
  • Multivariate kurtosis = 685.61 (p < 0.001)
  • According to Bentler (2005), multivariate kurtosis > 3.0 to 5.0 might bias factor analysis results

library(QuantPsyc) #package to test for Mardia's multivariate skew and kurtosis
mardia_results <- mult.norm(df)

mardia_results$mult.test
          Beta-hat      kappa p-val
Skewness  20.74316 8888.44605     0
Kurtosis 685.61377   82.69539     0

Plotting Mardia’s skewness

Call: mardia(x = df, na.rm = TRUE, plot = TRUE)

Mardia tests of multivariate skew and kurtosis
Use describe(x) the to get univariate tests
n.obs = 2571   num.vars =  23 
b1p =  20.74   skew =  8888.45  with probability  <=  0
 small sample skew =  8899.68  with probability <=  0
b2p =  685.61   kurtosis =  82.7  with probability <=  0

Robust Mahalanobis Distance

Robust MD is more appropriate for ordinal data because of its robustness to outliers and lack of assumptions about the underlying distribution (i.e., better suited for non-normally distributed data with potential outliers)

The robustMD() results below flag 10 multivariate outliers, which we will remove before starting the EFA.

library(faoutlier) #for robustMD() function
out = robustMD(df, method = "mcd")
print(out)
          mah p  sig
1117 177.6457 0 ****
84   161.6099 0 ****
285  130.2393 0 ****
2051 130.0928 0 ****
1278 120.8060 0 ****
678  117.7988 0 ****
808  116.8792 0 ****
1285 107.4826 0 ****
680  106.5546 0 ****
346  100.9642 0 ****

Plotting Robust MD

plot(out, type = 'qqplot')

Removing outliers (based on Robust MD Results)

#rows with outliers
outlier_indices <- c(1117, 84, 2051, 285, 1278, 678, 808, 1285, 680, 346)

#remove outliers
df_clean <- df %>%
  filter(!row_number() %in% outlier_indices)
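A quick sanity check that exactly 10 rows were dropped:

```r
#rows before vs. after outlier removal (2571 vs. 2561)
c(before = nrow(df), after = nrow(df_clean))
```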

Summary of Assumption #5 (Variables are ideally continuous, and if they are continuous, should have multivariate normality) checking: Both Mardia’s skewness/kurtosis tests and Mahalanobis Distance are intended for continuous variables. Because our data are ordinal, Mardia’s results are unsurprising: they suggest that our data are not multivariate normal. As a result, we should proceed with a polychoric matrix so as not to bias the EFA results. To accommodate our ordinal data, we used Robust Mahalanobis Distance (instead of the standard Mahalanobis Distance, which is intended for continuous variables) to identify multivariate outliers, and we removed the 10 flagged cases.

TO NOTE: It is often not appropriate to pretend that categorical variables are continuous (Flora et al., 2012). Since we are working with ordinal/categorical data, we will proceed with a polychoric matrix. Polychoric correlations assume that a normally distributed, continuous, but unobservable latent variable underlies each observed ordinal variable; they are maximum likelihood estimates of the Pearson correlations among those underlying continuous variables (Basto & Pereira, 2012).

  • Pearson correlations assume normality, which requires continuous scores; categorical scores are, by definition, not normally distributed (Bandalos, 2018; Hancock & Liu, 2012; Puth et al., 2015; Walsh, 1996).

Converting to Polychoric Matrix

output_poly <- polychoric(df_clean)

#extract polychoric matrix from output object
poly = output_poly$rho

#preview of polychoric matrix
head(poly)
           q01        q02        q03        q04        q05        q06
q01  1.0000000 -0.1332690 -0.3795848  0.4880221  0.4612646  0.2437908
q02 -0.1332690  1.0000000  0.4085532 -0.1543015 -0.1516755 -0.1025360
q03 -0.3795848  0.4085532  1.0000000 -0.4215101 -0.3370588 -0.2457773
q04  0.4880221 -0.1543015 -0.4215101  1.0000000  0.4603872  0.3096884
q05  0.4612646 -0.1516755 -0.3370588  0.4603872  1.0000000  0.2948085
q06  0.2437908 -0.1025360 -0.2457773  0.3096884  0.2948085  1.0000000
           q07         q08        q09        q10        q11        q12
q01  0.3406471  0.38772259 -0.1003074  0.2423981  0.4168383  0.3900703
q02 -0.2033396 -0.05290357  0.3955990 -0.1031174 -0.1933953 -0.2660150
q03 -0.4191982 -0.28861557  0.3247646 -0.2040251 -0.3971147 -0.4562483
q04  0.4481132  0.39817350 -0.1389703  0.2356322  0.4142662  0.4898198
q05  0.3737677  0.31855830 -0.1021172  0.2893173  0.3451599  0.3934881
q06  0.5840649  0.27911043 -0.1287985  0.3849408  0.3952384  0.3527046
           q13        q14        q15        q16        q17        q18
q01  0.3953091  0.3911645  0.2684441  0.5504146  0.4258216  0.4010071
q02 -0.2032315 -0.2202330 -0.2188470 -0.2254519 -0.1232949 -0.2178279
q03 -0.3554002 -0.4133536 -0.3408011 -0.4672952 -0.3643531 -0.4170851
q04  0.3743790  0.3936631  0.3674473  0.4566244  0.4229458  0.4249714
q05  0.3368028  0.3581356  0.2835309  0.4438527  0.3593412  0.3663981
q06  0.5105541  0.4584889  0.4067497  0.2761093  0.3277384  0.5667686
           q19        q20        q21        q22          q23
q01 -0.2172947  0.2459422  0.3673493 -0.1095262 -0.006225213
q02  0.2713114 -0.2674744 -0.2672002  0.2920950  0.111845276
q03  0.3890594 -0.3667907 -0.4618645  0.2250893  0.153367423
q04 -0.2155679  0.2795362  0.4490687 -0.1089267 -0.028711775
q05 -0.1944819  0.2322192  0.3828512 -0.1409451 -0.047309978
q06 -0.1968400  0.1122415  0.3094600 -0.1912809 -0.087463934
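As a spot check, we can compare the Pearson and polychoric estimates for a single item pair; for ordinal items the polychoric estimate is typically larger in magnitude (a sketch using the q01–q03 pair):

```r
#Pearson vs. polychoric estimate for the q01-q03 pair
c(pearson = cor(df_clean$q01, df_clean$q03), polychoric = poly["q01", "q03"])
```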

Is EFA Appropriate?

Ideally, we will have many correlations above 0.3, which is appropriate for EFA. The correlation matrix can be visually scanned to ensure that there are several coefficients ≥ .30 (Hair et al., 2019; Tabachnick & Fidell, 2019).
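Rather than relying on visual scanning alone, the share of sizable coefficients can be computed directly from the polychoric matrix (a sketch):

```r
#proportion of off-diagonal polychoric correlations with |r| >= .30
off_diag <- poly[lower.tri(poly)]
mean(abs(off_diag) >= 0.30)
```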

Polychoric Correlation Graph (orange = positive correlation, green = negative correlation; edges thicken as correlation strength increases)

library(qgraph) #package to create correlation graph

qgraph(poly, cut = 0.3, details = TRUE, posCol = "orangered", negCol = "green", labels=names(df_clean))

Another Correlation Plot

corPlot(poly, diag = F, zlim = c(.3, 1), upper = F, numbers = TRUE, cex.axis = 0.5)

The figures show that most correlations are above 0.3, which is reassuring, since that’s ideal for EFA. :)

KMO: Kaiser-Meyer-Olkin Factor Adequacy

  • KMO is used to measure sampling adequacy (Kaiser, 1974) and to ensure that there is enough shared variance. In other words, the KMO indicates whether variables are likely to be factorizable (values closer to 1 mean that the variables are more suitable for FA).

KMO(poly)
Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = poly)
Overall MSA =  0.93
MSA for each item = 
 q01  q02  q03  q04  q05  q06  q07  q08  q09  q10  q11  q12  q13  q14  q15  q16 
0.93 0.88 0.95 0.95 0.96 0.88 0.93 0.88 0.84 0.95 0.90 0.95 0.95 0.97 0.94 0.93 
 q17  q18  q19  q20  q21  q22  q23 
0.93 0.95 0.94 0.89 0.93 0.88 0.75 

Most of the KMO values are above 0.85, which is ideal! :)

Bartlett’s Test of Sphericity

  • Bartlett’s Test of Sphericity is used to ensure that the correlations between variables are significantly different from zero.

cortest.bartlett(poly, n = 2561)
$chisq
[1] 24379.88

$p.value
[1] 0

$df
[1] 253

Bartlett’s Test of Sphericity results indicate that we reject the hypothesis that the polychoric correlation matrix is an identity matrix (χ²(253) = 24,379.88, p < 0.001).

Determining the Number of Factors

Parallel Analysis

fa.parallel(poly, n.obs = 2561, fa = "pc", n.iter = 500) #n.iter = 500 means 500 simulated analyses to perform

Parallel analysis suggests that the number of factors =  NA  and the number of components =  4 

Parallel analysis suggests retaining 4 components (because we set fa = "pc", only components were examined, which is why the number of factors is reported as NA).

MAP: Minimal Average Partials

  • Velicer’s MAP is based on the average of the squared partial correlations among the observed variables after each successive factor is partialed out.
  • A lower MAP value is better: we retain the number of factors at which the average squared partial correlation reaches its minimum, because extracting additional factors beyond that point begins to partial out unique rather than common variance.

VSS(poly, rotate = "promax", fm = "pc", plot = TRUE, n.obs = 2561)


Very Simple Structure
Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
    n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
VSS complexity 1 achieves a maximimum of 0.83  with  1  factors
VSS complexity 2 achieves a maximimum of 0.88  with  2  factors

The Velicer MAP achieves a minimum of 0.01  with  4  factors 
Warning in min(x$vss.stats[, "BIC"], na.rm = TRUE): no non-missing arguments to
min; returning Inf

BIC achieves a minimum of  Inf  with    factors
Warning in min(x$vss.stats[, "SABIC"], na.rm = TRUE): no non-missing arguments
to min; returning Inf
Sample Size adjusted BIC achieves a minimum of  Inf  with    factors

Statistics by number of factors 
  vss1 vss2   map dof chisq prob sqresid  fit RMSEA BIC SABIC complex eChisq
1 0.83 0.00 0.015   0    NA   NA    13.4 0.83    NA  NA    NA      NA     NA
2 0.80 0.88 0.015   0    NA   NA     9.9 0.88    NA  NA    NA      NA     NA
3 0.73 0.86 0.016   0    NA   NA     8.0 0.90    NA  NA    NA      NA     NA
4 0.76 0.86 0.015   0    NA   NA     6.5 0.92    NA  NA    NA      NA     NA
5 0.77 0.86 0.017   0    NA   NA     5.5 0.93    NA  NA    NA      NA     NA
6 0.65 0.84 0.021   0    NA   NA     4.8 0.94    NA  NA    NA      NA     NA
7 0.76 0.86 0.026   0    NA   NA     4.2 0.95    NA  NA    NA      NA     NA
8 0.78 0.86 0.032   0    NA   NA     3.6 0.95    NA  NA    NA      NA     NA
  SRMR eCRMS eBIC
1   NA    NA   NA
2   NA    NA   NA
3   NA    NA   NA
4   NA    NA   NA
5   NA    NA   NA
6   NA    NA   NA
7   NA    NA   NA
8   NA    NA   NA
# note: promax is a type of oblique rotation, which allows the factors to be correlated

The Velicer MAP results achieved a minimum of 0.01 with 4 factors

Scree Plot

scree(poly, pc = TRUE, factors = FALSE, hline = -1)

The scree plot suggests 2-3 factors

Network Plot

library(EGAnet)

EGAnet (version 2.0.5) 

For help getting started, see <https://r-ega.net> 

For bugs and errors, submit an issue to <https://github.com/hfgolino/EGAnet/issues>
EGA(poly, n = 2561, plot.EGA = TRUE, steps = 10)

Model: GLASSO (EBIC with gamma = 0.5)
Correlations: auto
Lambda: 0.0708977369847798 (n = 100, ratio = 0.1)

Number of nodes: 23
Number of edges: 131
Edge density: 0.518

Non-zero edge weights: 
     M    SD    Min   Max
 0.056 0.087 -0.115 0.401

----

Algorithm:  Walktrap

Number of communities:  4

q01 q02 q03 q04 q05 q06 q07 q08 q09 q10 q11 q12 q13 q14 q15 q16 q17 q18 q19 q20 
  1   2   3   1   1   1   1   4   2   1   4   1   1   1   1   1   4   1   3   3 
q21 q22 q23 
  1   2   2 

----

Unidimensional Method: Louvain
Unidimensional: No

----

TEFI: -18.853

Network graph suggests 4 factors

How many factors?

  • Parallel Analysis: 4
  • MAP: 4
  • Scree Plot: 2-4
  • Network Plot: 4

We will fit three EFA models, with 2, 3, and 4 factors.

EFA MODELS BELOW

We will use oblimin rotation, an oblique rotation method that allows the factors to be correlated. Because it estimates the inter-factor correlations rather than forcing them to zero, it provides insight into the underlying structure of the data and is particularly useful when the factors are expected to be related.

EFA with 2 factors

f2 = fa(poly, nfactors = 2, rotate = "oblimin", residuals = T, SMC = T, fm = "wls", n.obs = 2561)
Loading required namespace: GPArotation
print(f2, sort = TRUE, cut = 0, digits = 2)
Factor Analysis using method =  wls
Call: fa(r = poly, nfactors = 2, n.obs = 2561, rotate = "oblimin", 
    residuals = T, SMC = T, fm = "wls")
Standardized loadings (pattern matrix) based upon correlation matrix
    item  WLS1  WLS2    h2   u2 com
q17   17  0.80  0.22 0.585 0.42 1.1
q11   11  0.77  0.13 0.545 0.46 1.1
q08    8  0.74  0.31 0.514 0.49 1.3
q18   18  0.65 -0.16 0.510 0.49 1.1
q13   13  0.65 -0.11 0.471 0.53 1.1
q07    7  0.64 -0.13 0.479 0.52 1.1
q04    4  0.63 -0.04 0.410 0.59 1.0
q16   16  0.62 -0.16 0.468 0.53 1.1
q01    1  0.61  0.01 0.368 0.63 1.0
q14   14  0.61 -0.15 0.447 0.55 1.1
q12   12  0.58 -0.23 0.468 0.53 1.3
q21   21  0.57 -0.22 0.444 0.56 1.3
q06    6  0.55 -0.05 0.325 0.67 1.0
q05    5  0.55 -0.05 0.315 0.69 1.0
q15   15  0.54 -0.12 0.345 0.65 1.1
q03    3 -0.44  0.41 0.475 0.52 2.0
q10   10  0.42 -0.05 0.191 0.81 1.0
q20   20  0.32 -0.27 0.222 0.78 1.9
q09    9  0.00  0.58 0.339 0.66 1.0
q02    2 -0.08  0.56 0.349 0.65 1.0
q19   19 -0.25  0.40 0.282 0.72 1.7
q22   22 -0.11  0.39 0.194 0.81 1.2
q23   23 -0.02  0.23 0.057 0.94 1.0

                      WLS1 WLS2
SS loadings           6.90 1.90
Proportion Var        0.30 0.08
Cumulative Var        0.30 0.38
Proportion Explained  0.78 0.22
Cumulative Proportion 0.78 1.00

 With factor correlations of 
     WLS1 WLS2
WLS1  1.0 -0.3
WLS2 -0.3  1.0

Mean item complexity =  1.2
Test of the hypothesis that 2 factors are sufficient.

df null model =  253  with the objective function =  9.56 with Chi Square =  24379.88
df of  the model are 208  and the objective function was  1.82 

The root mean square of the residuals (RMSR) is  0.05 
The df corrected root mean square of the residuals is  0.06 

The harmonic n.obs is  2561 with the empirical chi square  3919.95  with prob <  0 
The total n.obs was  2561  with Likelihood Chi Square =  4648.51  with prob <  0 

Tucker Lewis Index of factoring reliability =  0.776
RMSEA index =  0.091  and the 90 % confidence intervals are  0.089 0.094
BIC =  3016.1
Fit based upon off diagonal values = 0.97
Measures of factor score adequacy             
                                                  WLS1 WLS2
Correlation of (regression) scores with factors   0.96 0.86
Multiple R square of scores with factors          0.93 0.73
Minimum correlation of possible factor scores     0.85 0.46

EFA with 2 factors PLOT

fa.diagram(f2, cut = 0, sort = TRUE)

Most variables seem to load onto factor 1

EFA with 3 factors

f3 = fa(poly, nfactors = 3, rotate = "oblimin", residuals = T, SMC = T, fm = "wls", n.obs = 2561)
print(f3, sort = TRUE, cut = 0, digits = 2)
Factor Analysis using method =  wls
Call: fa(r = poly, nfactors = 3, n.obs = 2561, rotate = "oblimin", 
    residuals = T, SMC = T, fm = "wls")
Standardized loadings (pattern matrix) based upon correlation matrix
    item  WLS1  WLS2  WLS3   h2   u2 com
q12   12  0.73 -0.03  0.06 0.51 0.49 1.0
q18   18  0.69  0.08  0.06 0.53 0.47 1.0
q21   21  0.68  0.01  0.03 0.47 0.53 1.0
q14   14  0.65  0.07  0.07 0.47 0.53 1.0
q07    7  0.65  0.11  0.06 0.49 0.51 1.1
q03    3 -0.64  0.04  0.20 0.47 0.53 1.2
q16   16  0.60  0.14 -0.01 0.47 0.53 1.1
q13   13  0.58  0.18  0.02 0.47 0.53 1.2
q04    4  0.54  0.18  0.10 0.42 0.58 1.3
q20   20  0.50 -0.08 -0.06 0.23 0.77 1.1
q05    5  0.48  0.15  0.07 0.32 0.68 1.3
q02    2 -0.47  0.26  0.33 0.33 0.67 2.4
q19   19 -0.47  0.11  0.24 0.28 0.72 1.6
q01    1  0.47  0.23  0.10 0.37 0.63 1.5
q06    6  0.47  0.17  0.04 0.33 0.67 1.3
q15   15  0.46  0.19 -0.05 0.35 0.65 1.3
q10   10  0.32  0.17 -0.03 0.19 0.81 1.6
q08    8  0.06  0.74 -0.03 0.60 0.40 1.0
q17   17  0.19  0.69 -0.05 0.66 0.34 1.2
q11   11  0.22  0.65 -0.12 0.63 0.37 1.3
q23   23  0.18 -0.23  0.71 0.50 0.50 1.4
q22   22 -0.23  0.01  0.41 0.25 0.75 1.5
q09    9 -0.38  0.26  0.41 0.34 0.66 2.7

                      WLS1 WLS2 WLS3
SS loadings           6.17 2.34 1.17
Proportion Var        0.27 0.10 0.05
Cumulative Var        0.27 0.37 0.42
Proportion Explained  0.64 0.24 0.12
Cumulative Proportion 0.64 0.88 1.00

 With factor correlations of 
      WLS1 WLS2  WLS3
WLS1  1.00 0.52 -0.19
WLS2  0.52 1.00  0.00
WLS3 -0.19 0.00  1.00

Mean item complexity =  1.4
Test of the hypothesis that 3 factors are sufficient.

df null model =  253  with the objective function =  9.56 with Chi Square =  24379.88
df of  the model are 187  and the objective function was  1.45 

The root mean square of the residuals (RMSR) is  0.05 
The df corrected root mean square of the residuals is  0.06 

The harmonic n.obs is  2561 with the empirical chi square  2979.91  with prob <  0 
The total n.obs was  2561  with Likelihood Chi Square =  3689  with prob <  0 

Tucker Lewis Index of factoring reliability =  0.803
RMSEA index =  0.086  and the 90 % confidence intervals are  0.083 0.088
BIC =  2221.4
Fit based upon off diagonal values = 0.98
Measures of factor score adequacy             
                                                  WLS1 WLS2 WLS3
Correlation of (regression) scores with factors   0.96 0.91 0.84
Multiple R square of scores with factors          0.91 0.82 0.71
Minimum correlation of possible factor scores     0.83 0.65 0.41

EFA with 3 factors PLOT

fa.diagram(f3, cut = 0, sort = TRUE)

EFA with 4 factors

f4 = fa(poly, nfactors = 4, rotate = "oblimin", residuals = T, SMC = T, fm = "wls", n.obs = 2561)
print(f4, sort = TRUE, cut = 0, digits = 2)
Factor Analysis using method =  wls
Call: fa(r = poly, nfactors = 4, n.obs = 2561, rotate = "oblimin", 
    residuals = T, SMC = T, fm = "wls")
Standardized loadings (pattern matrix) based upon correlation matrix
    item  WLS1  WLS4  WLS3  WLS2   h2   u2 com
q21   21  0.60  0.05  0.07 -0.10 0.50 0.50 1.1
q04    4  0.56  0.16  0.07  0.05 0.47 0.53 1.2
q01    1  0.54  0.22  0.00  0.08 0.44 0.56 1.4
q16   16  0.53  0.16  0.06 -0.09 0.51 0.49 1.3
q12   12  0.53 -0.02  0.25 -0.09 0.51 0.49 1.5
q20   20  0.48  0.04 -0.16 -0.21 0.31 0.69 1.6
q05    5  0.47  0.12  0.11  0.04 0.35 0.65 1.3
q03    3 -0.43 -0.10  0.04  0.39 0.52 0.48 2.1
q08    8 -0.01  0.89 -0.06  0.07 0.73 0.27 1.0
q11   11 -0.03  0.79  0.07 -0.12 0.72 0.28 1.1
q17   17  0.12  0.71  0.06  0.02 0.64 0.36 1.1
q06    6 -0.10  0.03  0.87  0.01 0.71 0.29 1.0
q18   18  0.34  0.00  0.54 -0.04 0.60 0.40 1.7
q07    7  0.31  0.04  0.50 -0.04 0.55 0.45 1.7
q13   13  0.20  0.15  0.47 -0.08 0.52 0.48 1.6
q14   14  0.37  0.03  0.38 -0.05 0.48 0.52 2.0
q10   10  0.03  0.12  0.37 -0.06 0.23 0.77 1.3
q15   15  0.16  0.20  0.28 -0.14 0.35 0.65 3.1
q09    9  0.03  0.07 -0.03  0.63 0.38 0.62 1.0
q02    2 -0.14  0.04  0.06  0.58 0.38 0.62 1.1
q22   22  0.18 -0.11 -0.13  0.51 0.28 0.72 1.5
q19   19 -0.19 -0.03 -0.04  0.39 0.28 0.72 1.5
q23   23  0.23 -0.14 -0.04  0.36 0.13 0.87 2.0

                      WLS1 WLS4 WLS3 WLS2
SS loadings           3.53 2.63 2.59 1.81
Proportion Var        0.15 0.11 0.11 0.08
Cumulative Var        0.15 0.27 0.38 0.46
Proportion Explained  0.33 0.25 0.25 0.17
Cumulative Proportion 0.33 0.58 0.83 1.00

 With factor correlations of 
      WLS1  WLS4  WLS3  WLS2
WLS1  1.00  0.54  0.48 -0.39
WLS4  0.54  1.00  0.47 -0.21
WLS3  0.48  0.47  1.00 -0.30
WLS2 -0.39 -0.21 -0.30  1.00

Mean item complexity =  1.5
Test of the hypothesis that 4 factors are sufficient.

df null model =  253  with the objective function =  9.56 with Chi Square =  24379.88
df of  the model are 167  and the objective function was  0.69 

The root mean square of the residuals (RMSR) is  0.03 
The df corrected root mean square of the residuals is  0.04 

The harmonic n.obs is  2561 with the empirical chi square  1097.32  with prob <  1.4e-136 
The total n.obs was  2561  with Likelihood Chi Square =  1770.78  with prob <  1.1e-265 

Tucker Lewis Index of factoring reliability =  0.899
RMSEA index =  0.061  and the 90 % confidence intervals are  0.059 0.064
BIC =  460.14
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy             
                                                  WLS1 WLS4 WLS3 WLS2
Correlation of (regression) scores with factors   0.92 0.94 0.92 0.85
Multiple R square of scores with factors          0.85 0.88 0.85 0.73
Minimum correlation of possible factor scores     0.70 0.76 0.69 0.45

The root mean square of the residuals (RMSR) for the 4-factor model: 0.03

EFA with 4 factors PLOT

fa.diagram(f4, cut = 0, sort = TRUE)
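Since oblimin is an oblique rotation, the estimated inter-factor correlations can also be pulled directly from the fitted object (psych stores them in the $Phi element; this mirrors the “factor correlations” table in the printout above):

```r
#inter-factor correlations from the oblimin-rotated 4-factor solution
round(f4$Phi, 2)
```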

Summary of RMSR values:

  • 2-factor model: 0.05
  • 3-factor model: 0.05
  • 4-factor model: 0.03

The smaller the RMSR (Root Mean Square Residual) value, the better: RMSR is a measure of overall residual misfit, and values less than .08 are preferred (Brown, 2013). The RMSR results suggest that the 4-factor model is the most suitable fit to the data.
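The same RMSR values can be pulled programmatically from the fitted objects (a sketch; psych stores the root mean square of the residuals in the $rms element of an fa() result):

```r
#RMSR for each candidate model, rounded to 2 decimals
sapply(list(f2 = f2, f3 = f3, f4 = f4), function(m) round(m$rms, 2))
```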